12 research outputs found

    Hierarchical Imitation Learning for Stochastic Environments

    Full text link
    Many applications of imitation learning require the agent to generate the full distribution of behaviour observed in the training data. For example, to evaluate the safety of autonomous vehicles in simulation, accurate and diverse behaviour models of other road users are paramount. Existing methods that improve this distributional realism typically rely on hierarchical policies. These condition the policy on types such as goals or personas that give rise to multi-modal behaviour. However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors: because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent behaviour are disentangled and only internal factors, i.e., those under the agent's control, are encoded in the type. Encoding future information about external factors leads to inappropriate agent reactions during testing, when the future is unknown and types must be drawn independently from the actual future. We formalize this challenge as distribution shift in the conditional distribution of agent types under environmental stochasticity. We propose Robust Type Conditioning (RTC), which eliminates this shift with adversarial training under randomly sampled types. Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines. Comment: Published at IROS'2

    VizWiz

    Get PDF
    The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real-time: asking multiple people on the web. To support answering questions quickly, we introduce a general approach for intelligently recruiting human workers in advance called quikTurkit so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.

    Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

    Full text link
    Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

    Using FPGAs to perform embedded image registration

    No full text
    Image registration is the process of relating the intensity values of one image to another image using their pixel content alone. An example use of this technique is to create panoramas from individual images taken from a rotating camera. A class of image registration algorithms, known as direct registration methods, uses intensity derivatives to iteratively estimate the parameters modeling the transformation between the images. Direct methods are known for their sub-pixel accurate results; however, their execution is computationally expensive, often preventing use in an embedded capacity like those encountered in small unmanned aerial vehicle or mobile phone applications. In this work, a high-performance FPGA-based direct affine image registration core is presented. The proposed method combines two features: a fully pipelined architecture to compute the linear system of equations, and a Gaussian elimination module, implemented as a finite state machine, to solve the resulting linear system. The design is implemented on a Xilinx ML506 development board featuring a Virtex-5 SX50 FPGA, zero bus turn-around (ZBT) RAM, and VGA input. Experimentation is performed on both real and synthetic data. The registration core performs in excess of 80 frames per second on 640x480 images using one registration iteration.
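    The linear solve that the abstract describes as a finite-state-machine Gaussian elimination module can be sketched in software as follows. This is an illustrative reference implementation of Gaussian elimination with partial pivoting, not the paper's hardware design; all names are hypothetical.

    ```python
    def solve_gaussian(A, b):
        """Solve A x = b by Gaussian elimination with partial pivoting.

        A: n x n list of lists of floats; b: length-n list of floats.
        A software sketch of the solve step a direct registration
        iteration performs on its accumulated normal equations.
        """
        n = len(b)
        # Build the augmented matrix [A | b], copying so inputs are untouched.
        M = [row[:] + [b[i]] for i, row in enumerate(A)]
        for col in range(n):
            # Partial pivoting: swap up the row with the largest entry in this column.
            pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
            M[col], M[pivot] = M[pivot], M[col]
            # Eliminate this column from all rows below the pivot row.
            for r in range(col + 1, n):
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
        # Back-substitution on the resulting upper-triangular system.
        x = [0.0] * n
        for r in range(n - 1, -1, -1):
            x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
        return x
    ```

    In the FPGA core the same elimination and back-substitution steps are sequenced by a state machine rather than nested loops, which is what makes the per-iteration solve cheap enough for 80+ fps operation.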

    Automatically tuning background subtraction parameters using particle swarm optimization

    No full text
    A common trait of background subtraction algorithms is that they have learning rates, thresholds, and initial values that are hand-tuned for a scenario in order to produce the desired subtraction result; however, the need to tune these parameters makes it difficult to use state-of-the-art methods, fuse multiple methods, and choose an algorithm based on the current application, as it requires the end-user to become proficient in tuning a new parameter set. The proposed solution is to automate this task by using a Particle Swarm Optimization (PSO) algorithm to maximize a fitness function compared to provided ground-truth images. The fitness function used is the F-measure, which is the harmonic mean of recall and precision. This method reduces the total pixel error of the Mixture of Gaussians background subtraction algorithm by more than 50% on the diverse Wallflower data-set.
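    The fitness function named in the abstract, the F-measure over foreground masks, can be sketched as below. This is a minimal illustration of the metric itself, assuming boolean mask arrays; it is not the paper's code, and the function name is hypothetical.

    ```python
    import numpy as np

    def f_measure(mask, truth):
        """F-measure (harmonic mean of precision and recall) between a
        binary foreground mask and a ground-truth mask of equal shape."""
        tp = np.logical_and(mask, truth).sum()    # foreground pixels correctly detected
        fp = np.logical_and(mask, ~truth).sum()   # background pixels marked foreground
        fn = np.logical_and(~mask, truth).sum()   # foreground pixels missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)
    ```

    In the described setup, PSO would treat each particle as a candidate parameter vector for the background subtractor and use this score against the ground-truth images as the fitness to maximize.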

    Analyzing Team Actions with Cascading HMMs

    No full text
    While team action recognition has a relatively extended literature, less attention has been given to the detailed real-time analysis of the internal structure of the team actions. This includes recognizing the current state of the action, predicting the next state, recognizing deviations from the standard action model, and handling ambiguous cases. The underlying probabilistic reasoning model has a major impact on the type of data it can extract, its accuracy, and the computational cost of the reasoning process. In this paper we are using Cascading Hidden Markov Models (CHMM) to analyze Bounding Overwatch, an important team action in military tactics. The team action is represented in the CHMM as a plan tree. Starting from real-world recorded data, we identify the sub-teams through clustering and extract team-oriented discrete features. In an experimental study, we investigate whether the better scalability and the more structured information provided by the CHMM comes with an unacceptable cost in accuracy. We find that a properly parametrized CHMM estimating the current goal chain of the Bounding Overwatch plan tree comes very close to a flat HMM estimating only the overall Bounding Overwatch state (a subset of the goal chain) at a respective overall state accuracy of 95% vs. 98%, making the CHMM a good candidate for deployed systems. Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
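    The flat-HMM baseline the abstract compares against estimates the current overall state from the observation sequence, which is exactly what the standard forward algorithm computes. A minimal sketch, with illustrative names and toy parameters (not the paper's model):

    ```python
    def forward(obs, pi, A, B):
        """Forward algorithm for a flat HMM: posterior over hidden states
        after observing the sequence `obs`.

        pi[i]: initial probability of state i
        A[i][j]: transition probability from state i to state j
        B[i][o]: probability of emitting observation o in state i
        """
        n = len(pi)
        # Initialize with the first observation.
        alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
        # Propagate belief through transitions, then weight by each emission.
        for o in obs[1:]:
            alpha = [
                B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                for j in range(n)
            ]
        total = sum(alpha)
        return [a / total for a in alpha]  # normalized filtered posterior
    ```

    A CHMM stacks several such layers so that, instead of a single overall state, the filtered estimate is a goal chain down the plan tree, at the cost of a more involved inference step.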

    Person And Vehicle Tracking In Surveillance Video

    No full text
    This evaluation for person and vehicle tracking in surveillance presented some new challenges. The dataset was large and very high-quality, but with difficult scene properties involving illumination changes, unusual lighting conditions, and complicated occlusion of objects. Since this is a well-researched scenario [1], our submission was based primarily on our existing projects for automated object detection and tracking in surveillance. We also added several new features that are practical improvements for handling the difficulties of this dataset. © 2008 Springer-Verlag Berlin Heidelberg.

    VizWiz::LocateIt - enabling blind people to locate objects in their environment

    No full text
    Blind people face a number of challenges when interacting with their environments because so much information is encoded visually. Text is pervasively used to label objects, colors carry special significance, and items can easily become lost in surroundings that cannot be quickly scanned. Many tools seek to help blind people solve these problems by enabling them to query for additional information, such as color or text shown on the object. In this paper we argue that many useful problems may be better solved by directly modeling them as search problems, and present a solution called VizWiz::LocateIt that directly supports this type of interaction. VizWiz::LocateIt enables blind people to take a picture and ask for assistance in finding a specific object. The request is first forwarded to remote workers who outline the object; a two-stage algorithm then combines these outlines with efficient automatic computer vision to guide users to the object interactively from their existing cellphones. © 2010 IEEE